An Improvement of HSMM-Based Speech Synthesis by Duration-Dependent State Transition Probabilities
Authors
Abstract
In this paper, we propose an improvement to hidden semi-Markov model (HSMM) based speech synthesis using duration-dependent state transition probabilities. In a conventional HMM, the state duration probability decreases exponentially with time, which does not adequately represent the temporal structure of speech. The HSMM, which explicitly models the state duration distribution, was proposed to overcome this limitation. However, an inconsistency remains: although the HSMM has explicit state duration probability distributions, its state transition probabilities are duration-invariant. We introduce duration-dependent state transition probabilities, which can characterize the timescale distortion at a particular instant of an utterance more effectively, into an HSMM-based speech synthesis system. Correspondingly, we modify the forward-backward algorithm and re-derive the parameter re-estimation formulae. Experimental results show that the proposed method improves the naturalness of the synthesized speech.
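The contrast the abstract draws between implicit HMM durations and explicit HSMM durations can be illustrated with a small sketch. It is not the paper's model: the function names, the self-loop value 0.8, and the truncated Gaussian duration shape are all assumptions chosen for illustration. The sketch shows that a fixed self-transition probability implies a geometric duration distribution whose exit probability is the same at every duration, whereas an explicit duration distribution can peak at a realistic length.

```python
import math
from typing import List


def hmm_duration_pmf(self_loop: float, max_d: int) -> List[float]:
    """Duration pmf implied by an HMM self-transition probability.

    Staying d frames means d-1 self-loops followed by one exit, so
    P(d) = a^(d-1) * (1 - a), a geometric law that always peaks at d = 1.
    """
    return [self_loop ** (d - 1) * (1.0 - self_loop) for d in range(1, max_d + 1)]


def explicit_duration_pmf(mean: float, std: float, max_d: int) -> List[float]:
    """Explicit duration pmf in the HSMM spirit.

    Assumption for illustration: a Gaussian shape evaluated at integer
    durations 1..max_d and renormalized to sum to one.
    """
    weights = [math.exp(-0.5 * ((d - mean) / std) ** 2) for d in range(1, max_d + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def exit_probability_by_duration(pmf: List[float]) -> List[float]:
    """P(leave after exactly d frames | state has lasted at least d frames)."""
    exits = []
    survival = 1.0  # probability that the state has lasted at least d frames
    for p in pmf:
        exits.append(p / survival if survival > 1e-12 else 1.0)
        survival -= p
    return exits


if __name__ == "__main__":
    geometric = hmm_duration_pmf(self_loop=0.8, max_d=20)
    explicit = explicit_duration_pmf(mean=8.0, std=2.0, max_d=20)
    print("HMM most likely duration:", geometric.index(max(geometric)) + 1)
    print("Explicit most likely duration:", explicit.index(max(explicit)) + 1)
    print("HMM exit probabilities:", [round(h, 3) for h in exit_probability_by_duration(geometric)[:5]])
```

In this sketch the HMM exit probability stays at one minus the self-loop probability regardless of how long the state has lasted, which is the duration-invariance the abstract refers to. How the paper actually parameterizes duration-dependent transitions between states is not specified in the abstract and is not reproduced here.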
Similar resources
Explicit duration modelling in HMM-based speech synthesis using a hybrid hidden Markov model-multilayer perceptron
In HMM-based speech synthesis, it is important to correctly model duration because it has a significant effect on the perceptual quality of speech, such as rhythm. For this reason, hidden semi-Markov model (HSMM) is commonly used to explicitly model duration instead of using the implicit state duration model of HMM through its transition probabilities. The cost of using HSMM to improve duration...
Hidden semi-Markov model based speech synthesis
In the present paper, a hidden semi-Markov model (HSMM) based speech synthesis system is proposed. In the hidden Markov model (HMM) based speech synthesis system that we have proposed, rhythm and tempo are controlled by state duration probability distributions modeled by single Gaussian distributions. To synthesize speech, it constructs a sentence HMM corresponding to an arbitrarily given text an...
MLLR adaptation for hidden semi-Markov model based speech synthesis
This paper describes an extension of maximum likelihood linear regression (MLLR) to hidden semi-Markov model (HSMM) and presents an adaptation technique of phoneme/state duration for an HMM-based speech synthesis system using HSMMs. The HSMM-based MLLR technique can realize the simultaneous adaptation of output distributions and state duration distributions. We focus on describing mathematical ...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
A Bayesian approach to Hidden Semi-Markov Model based speech synthesis
This paper proposes a Bayesian approach to hidden semi-Markov model (HSMM) based speech synthesis. Recently, hidden Markov model (HMM) based speech synthesis based on the Bayesian approach was proposed. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. In the Bayesian approach, all processes for con...